Building a comprehensive syntactic and semantic corpus of Chinese clinical texts

Abstract

Objective: To build a comprehensive corpus covering syntactic and semantic annotations of Chinese clinical texts, with corresponding annotation guidelines and methods, as well as to develop tools trained on the annotated corpus, which supplies baselines for research on Chinese texts in the clinical domain.

Materials and methods: An iterative annotation method was proposed to train annotators and to develop annotation guidelines. Then, using annotation quality assurance measures, a comprehensive corpus was built containing annotations of part-of-speech (POS) tags, syntactic tags, entities, assertions, and relations. Inter-annotator agreement (IAA) was calculated to evaluate the annotation quality, and a Chinese clinical text processing and information extraction system (CCTPIES) was developed based on our annotated corpus.

Results: The syntactic corpus consists of 138 Chinese clinical documents with 47,424 tokens and 2553 full parsing trees, while the semantic corpus includes 992 documents annotated with 39,511 entities, their assertions, and 7695 relations. IAA evaluation shows that this comprehensive corpus is of good quality and that the system modules are effective.

Discussion: The annotated corpus makes a considerable contribution to natural language processing (NLP) research into Chinese texts in the clinical domain. However, this corpus has a number of limitations. Additional types of clinical text should be introduced to improve corpus coverage, and active learning methods should be used to improve annotation efficiency.

Conclusions: In this study, several annotation guidelines and an annotation method for Chinese clinical texts were proposed, and a comprehensive corpus with its NLP modules was constructed, providing a foundation for further study of applying NLP techniques to Chinese texts in the clinical domain.
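The abstract reports that inter-annotator agreement (IAA) was calculated but does not say which statistic was used. As an illustration only, the sketch below computes Cohen's kappa, one common IAA measure for two annotators labelling the same tokens. The function name `cohen_kappa` and the sample POS tags are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items.

    Assumption: this is one possible IAA statistic; the abstract does not
    state which measure the authors actually used.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label rates,
    # summed over labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical POS tags from two annotators for four tokens.
annotator_1 = ["NN", "VV", "NN", "AD"]
annotator_2 = ["NN", "VV", "VV", "AD"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 4))
```

Kappa corrects raw agreement for agreement expected by chance, which matters when a few tags dominate the corpus. For span-based annotations such as entities and relations, pairwise F-measure is a frequently used alternative, since the set of "negative" items is not well defined there.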
